Foreword



The present document describes tools used in the Enhanced aacPlus general audio codec for the general audio service 
within the 3GPP system. 

The contents of the present document are subject to continuing work within the TSG and may change following formal 
TSG approval. Should the TSG modify the contents of this TS, it will be re-released by the TSG with an identifying 
change of release date and an increase in version number as follows: 

Version x.y.z 

where: 

x the first digit: 

1 presented to TSG for information; 

2 presented to TSG for approval; 

3 Indicates TSG approved document under change control. 

y the second digit is incremented for all changes of substance, i.e. technical enhancements, corrections, 
updates, etc. 

z the third digit is incremented when editorial only changes have been incorporated in the specification.

1 Scope



This Telecommunication Standard (TS) describes the error concealment algorithm, SBR parameter downmix and output 
resampling for the Enhanced aacPlus general audio codec [3]. 



2 Normative references



This TS incorporates by dated and undated reference, provisions from other publications. These normative references 
are cited in the appropriate places in the text and the publications are listed hereafter. For dated references, subsequent 
amendments to or revisions of any of these publications apply to this TS only when incorporated in it by amendment or 
revision. For undated references, the latest edition of the publication referred to applies. 

[1] ISO/IEC 14496-3:2001/Amd. 1:2003: "Bandwidth Extension". 

[2] ISO/IEC 14496-3:2001/Amd.1:2003/DCOR1.

[3] 3GPP TS 26.401: "Enhanced aacPlus general audio codec; General Description". 

3 Definitions, symbols and abbreviations 

3.1 Definitions 

For the purposes of this TS, the following definitions apply: 

band: (as in limiter band, noise floor band, etc.) a group of consecutive QMF subbands 

envelope scalefactor: an element representing the averaged energy of a signal over a region described by a 
frequency band and a time segment 

frequency band: interval in frequency, group of consecutive QMF subbands 

frequency border: frequency band delimiter, expressed as a specific QMF subband 

noise floor: a vector of noise floor scalefactors 

noise floor scalefactor: an element associated with a region described by a frequency band and a time segment, 
representing the ratio between the energy of the noise to be added to the envelope adjusted HF 
generated signal and the energy of the same 

SBR envelope: a vector of envelope scalefactors 

SBR frame: time segment associated with one SBR extension data element 

SBR range: the frequency range of the signal generated by the SBR algorithm 

subband: a frequency range represented by one row in a QMF matrix, carrying a subsampled signal 

time border: time segment delimiter, expressed as a specific time slot 

time segment: interval in time, group of consecutive time slots 

time / frequency grid: a description of SBR envelope time segments and associated frequency resolution tables as 
well as description of noise floor time segments 

time slot: finest resolution in time for SBR envelopes and noise floors. One time slot equals two subsamples in the 
QMF domain 
3.2 Symbols

For the purposes of this TS, the following symbols apply:

$E_{Orig}$ has $L_E$ columns where each column is of length $N_{Low}$ or $N_{High}$ depending on the frequency resolution for each SBR envelope. The elements in $E_{Orig}$ contain the envelope scalefactors of the original signal.

$Fs_{out}$ the output sampling rate from the SBR Tool.

$Fs_{SBR}$ internal sampling frequency of the SBR Tool, twice the sampling frequency of the core coder (after sampling frequency mapping, ISO/IEC 14496-3:2001, Table 4.55). The sampling frequency of the SBR enhanced output signal is equal to the internal sampling frequency of the SBR Tool, unless the SBR Tool is operated in downsampled mode. If the SBR Tool is operated in downsampled mode, the output sampling frequency $Fs_{out}$ is equal to the sampling frequency of the core coder.

$F = [f_{TableLow}, f_{TableHigh}]$ has two column vectors containing the frequency border tables for low and high frequency resolution.

$f_{TableHigh}$ is of length $N_{High}+1$ and contains frequency borders for high frequency resolution SBR envelopes.

$f_{TableLow}$ is of length $N_{Low}+1$ and contains frequency borders for low frequency resolution SBR envelopes.

$L_E$ number of SBR envelopes.

$L_Q$ number of noise floors.

$N_Q$ number of noise floor bands.

$n = [N_{Low}, N_{High}]$ number of frequency bands for low and high frequency resolution.

numTimeSlots number of SBR envelope time slots that exist within an AAC frame, 16 for a 1024 AAC frame and 15 for a 960 AAC frame.

panOffset = [24, 12] offset values for the SBR envelope and noise floor data, when using coupled channels.

$Q_{Orig}$ has $L_Q$ columns where each column is of length $N_Q$ and contains the noise floor scalefactors.

$r = [r(0), \ldots, r(L_E - 1)]$ frequency resolution for all SBR envelopes in the current SBR frame, zero for low resolution, one for high resolution.

$t_E$ is of length $L_E+1$ and contains start and stop time borders for all SBR envelopes in the current SBR frame.

$t_Q$ is of length $L_Q+1$ and contains start and stop time borders for all noise floors in the current SBR frame.

Y is the complex output QMF bank subband matrix from the HF adjuster. 

3.3 Abbreviations 

For the purposes of this TS, the following abbreviations apply. 

AAC Advanced Audio Coding 

aacPlus Combination of MPEG-4 AAC and MPEG-4 Bandwidth extension (SBR) 

Enhanced aacPlus Combination of MPEG-4 AAC, MPEG-4 Bandwidth extension (SBR) and MPEG-4 Parametric Stereo

MPEG Moving Picture Experts Group

SBR Spectral Band Replication 
4 Outline description

This TS is structured as follows: 

Section 5 gives a detailed description of the error concealment algorithms in the Enhanced aacPlus decoder. Section 5.1 describes the error concealment of the AAC decoder, and section 5.2 outlines the error concealment of the SBR algorithm.

Section 6 gives a detailed description of how stereo SBR parameters are downmixed to mono SBR parameters.

Section 7 gives a detailed description of the additional downsampler tool, which enables the Enhanced aacPlus codec to provide output sampling rates of 8 and 16 kHz regardless of the sampling rate used for the coded signal.



5 Error concealment 

5.1 AAC error concealment 

The AAC core decoder includes a concealment function that increases the delay of the decoder by one frame. 

There are various tests inside the core decoder, starting with simple CRC tests and ending in a variety of plausibility 
checks. If such a check indicates an invalid bitstream, then concealment is applied. Concealment is also applied when 
the calling main program indicates a distorted or missing data frame using the frameOK flag. This is used for error 
detection on the transport layer. 

Concealment works on the spectral data just before the final frequency to time conversion. If a single frame is corrupted, concealment interpolates between the last good frame and the first good frame to create the spectral data for the missing frame. Because of the one-frame delay, the frequency to time conversion always processes the previous frame; the missing frame to be replaced is therefore the previous frame, the last good frame is the frame before the previous one, and the first good frame is the current frame. If multiple frames are corrupted, concealment first fades out, using slightly modified spectral values from the last good frame. As soon as good frames are available again, concealment fades in the new spectral data.

Interpolation of one corrupt frame: 

In the following, the current frame is frame number n, the corrupt frame to be interpolated is frame n-1, and the last but one frame has the number n-2.

The window sequence and the window shape of the corrupt frame are determined from the table below:

Table 1 : Interpolated window sequences and window shapes

| window sequence n-2 | window sequence n | window sequence n-1 | window shape n-1 |
|---|---|---|---|
| ONLY_LONG_SEQUENCE or LONG_START_SEQUENCE or LONG_STOP_SEQUENCE | ONLY_LONG_SEQUENCE or LONG_START_SEQUENCE or LONG_STOP_SEQUENCE | ONLY_LONG_SEQUENCE | 0 |
| ONLY_LONG_SEQUENCE or LONG_START_SEQUENCE or LONG_STOP_SEQUENCE | EIGHT_SHORT_SEQUENCE | LONG_START_SEQUENCE | 1 |
| EIGHT_SHORT_SEQUENCE | EIGHT_SHORT_SEQUENCE | EIGHT_SHORT_SEQUENCE | 1 |
| EIGHT_SHORT_SEQUENCE | ONLY_LONG_SEQUENCE or LONG_START_SEQUENCE or LONG_STOP_SEQUENCE | LONG_STOP_SEQUENCE | 0 |


The scalefactor band energies of frames n-2 and n are calculated. If the window sequence in one of these frames is an EIGHT_SHORT_SEQUENCE and the final window sequence for frame n-1 is one of the long transform windows, the scalefactor band energies are calculated for long block scalefactor bands by mapping the frequency line index of short block spectral coefficients to a long block representation. The new interpolated spectrum is built by reusing the spectrum of the older frame n-2 and multiplying each spectral coefficient by a factor. An exception is made in the case of a short window sequence in frame n-2 and a long window sequence in frame n; here the spectrum of the current frame n is modified by the interpolation factor. This factor is constant over the range of each scalefactor band and is derived from the scalefactor band energy differences of frames n-2 and n. Finally, the signs of the interpolated spectral coefficients are flipped randomly.

Fade out and in: 

A complete fading out takes 5 frames. The spectral coefficients from the last good frame are copied and attenuated by a 
factor of: 



$$\mathrm{fadeOutFac} = 2^{-(\mathrm{nFadeOutFrame}/2)}$$



with nFadeOutFrame as frame counter since the last good frame. 

After 5 frames of fading out the concealment switches to muting, that means the complete spectrum will be set to 0. 

The decoder fades in when receiving good frames again. The fade in process also takes 5 frames, and the factor multiplied to the spectrum is:

$$\mathrm{fadeInFac} = 2^{-(5 - \mathrm{nFadeInFrame})/2}$$

where nFadeInFrame is the frame counter since the first good frame after concealing multiple frames.
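
The two fade factors can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation: the function names are hypothetical, and the counter conventions (fade out counter starting at 1 for the first faded frame, muting for counters above 5, fade in factor clamped to 1 from counter 5 onwards) are assumptions, since the text does not state where the counters start.

```python
FADE_FRAMES = 5  # a complete fade out or fade in takes 5 frames


def fade_out_factor(n_fade_out_frame: int) -> float:
    """Attenuation applied to the copied spectrum of the last good frame."""
    if n_fade_out_frame > FADE_FRAMES:
        return 0.0  # assumption: muting starts once 5 frames have been faded
    return 2.0 ** (-(n_fade_out_frame / 2.0))


def fade_in_factor(n_fade_in_frame: int) -> float:
    """Gain applied to the spectrum of good frames after a concealed burst."""
    if n_fade_in_frame >= FADE_FRAMES:
        return 1.0  # assumption: fade in is complete, no further scaling
    return 2.0 ** (-(FADE_FRAMES - n_fade_in_frame) / 2.0)
```

Each step changes the amplitude by a factor of the square root of 2, so two frames correspond to a halving or doubling of the amplitude.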

5.2 SBR error concealment 

The SBR error concealment algorithm is based on using previous envelope and noise-floor values with an applied 
decay, as a substitute for the corrupt data. In the flowchart of Figure 1 the basic operation of the SBR error concealment 
algorithm is outlined. If the frame error flag is set, error concealment bitstream data is generated to be used instead of 
the corrupt bitstream data. The concealment data is generated according to the following. 

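The time frequency grids are set to:

$$
\begin{aligned}
L_E &= 1 \\
t_E(0) &= t'_E(L'_E) - \mathrm{numTimeSlots} \\
t_E(1) &= \mathrm{numTimeSlots} \\
r(l) &= \mathrm{HI}, \quad 0 \le l < L_E \\
\mathrm{bs\_pointer} &= 0 \\
L_Q &= 1 \\
t_Q &= [t_E(0), t_E(1)]
\end{aligned}
$$

where primed symbols denote the values of the previous SBR frame.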



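The delta coding direction for both the envelope data and the noise floor data is set to the time direction. The envelope data is calculated according to:

$$E_{Delta}(k,l) = \begin{cases} -\mathrm{step}, & E_{prev}(k,l) > \mathrm{target} \\ \mathrm{step}, & \text{otherwise} \end{cases}, \quad 0 \le k < n(r(l)),\ 0 \le l < L_E$$

where

$$\mathrm{step} = \begin{cases} 2, & \text{if } \mathrm{bs\_amp\_res} = 1 \\ 1, & \text{otherwise} \end{cases}$$

$$\mathrm{target} = \begin{cases} \mathrm{panOffset}(\mathrm{bs\_amp\_res}), & \text{if } \mathrm{bs\_coupling} = 1 \\ 0, & \text{otherwise} \end{cases}$$

and where bs_amp_res and bs_coupling are set to the values of the previous frame.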
The noise floor data is calculated according to: 

Furthermore, the inverse-filtering levels in bs_invf_mode are set to the values of the previous frame, and all elements in 
bs_add_harmonic are set to zero. 
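
The generation of the concealment envelope deltas can be sketched as follows. This is a hedged illustration only: the function name is hypothetical, and `prev_envelope` is assumed to hold the envelope values of the previous frame for one SBR envelope, which is how the comparison against the target is read here.

```python
PAN_OFFSET = (24, 12)  # panOffset from the symbol list, indexed by bs_amp_res


def concealment_envelope_delta(prev_envelope, bs_amp_res, bs_coupling):
    """Return time-direction deltas that move each band towards the target."""
    step = 2 if bs_amp_res == 1 else 1
    target = PAN_OFFSET[bs_amp_res] if bs_coupling == 1 else 0
    # Bands above the target decay by one step, all others grow by one step.
    return [-step if value > target else step for value in prev_envelope]
```

For example, `concealment_envelope_delta([30, 0, 12, 13], 1, 1)` uses step 2 and target 12, so only the bands holding 30 and 13 receive a negative delta.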

If the frame error flag is not set, the present time grid and envelope data may need modification if the previous frame was corrupt. If the previous frame was corrupt, the time grid of the present frame is modified in order to make sure that there is a continuous transition between the frames. The envelope data for the first envelope is modified according to:

$$E_{mod}(k,0) = E(k,0) + a \cdot \log_2\!\left(\frac{t_E(1) - t_E(0)}{t_E(1) - \mathrm{estimated\_start\_pos}}\right), \quad 0 \le k < F(r(l),0)$$

where

$$\mathrm{estimated\_start\_pos} = t'_E(L'_E) - \mathrm{numTimeSlots}.$$

After the delta coded data has been decoded, a plausibility check is performed to make sure that the decoded data is 
within reasonable limits. The required limits are: 

For the envelope data the logarithmic values shall fulfil:

$$E(k,l) \le \begin{cases} 35, & \mathrm{ampRes} = 0 \\ 70, & \mathrm{ampRes} = 1 \end{cases}$$

otherwise the frame will be considered corrupt.

The time grids are also verified according to the following rules (if any of the conditions below is true, the frame is considered to be corrupt):

- $L_E < 1$

- $L_E > 5$

- $L_Q > 2$

- $t_E(0) < 0$

- $t_E(0) \ge t_E(L_E)$

- $t_E(0) > 3$

- $t_E(L_E) < 16$

- $t_E(L_E) > 19$

- $t_E(l) \ge t_E(l+1)$, $0 \le l < L_E$

- $l_A > L_E$

- $L_E = 1$ AND $L_Q > 1$

- $t_Q(0) \ne t_E(0)$

- $t_Q(L_Q) \ne t_E(L_E)$

- $t_Q(l) \ge t_Q(l+1)$, $0 \le l < L_Q$

- at least one element of $t_Q$ is not among the elements of $t_E$

If the plausibility check fails, the frame error flag is set and the error concealment outlined above is applied. 
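
The time grid rules can be collected into a single predicate, sketched below. The function name and argument layout are hypothetical; the ordering tests are read as "greater than or equal", and the guard against a grid without any noise floor is an extra assumption that is not part of the list.

```python
def time_grid_is_corrupt(t_e, t_q, l_a):
    """Return True if any plausibility rule for the time grids is violated."""
    l_e = len(t_e) - 1  # number of SBR envelopes
    l_q = len(t_q) - 1  # number of noise floors
    if l_e < 1 or l_e > 5 or l_q > 2:
        return True
    if l_q < 1:
        return True  # assumption: not in the rule list, guards the indexing
    if t_e[0] < 0 or t_e[0] > 3 or t_e[0] >= t_e[l_e]:
        return True
    if t_e[l_e] < 16 or t_e[l_e] > 19:
        return True
    if any(t_e[l] >= t_e[l + 1] for l in range(l_e)):
        return True
    if l_a > l_e:
        return True
    if l_e == 1 and l_q > 1:
        return True
    if t_q[0] != t_e[0] or t_q[l_q] != t_e[l_e]:
        return True
    if any(t_q[l] >= t_q[l + 1] for l in range(l_q)):
        return True
    # Every noise floor border has to coincide with an envelope border.
    return any(border not in t_e for border in t_q)
```

A caller would set the frame error flag and run the concealment path whenever this predicate, or the envelope limit check, reports a corrupt frame.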
[Figure: flowchart with the blocks "Generate concealing control data", "timeCompensateFirstEnvelope", "coupling = prevCoupling", "deltaToLinearPCMEnvelopeDecoding", "addConcealingEnvelopeData" and "set frame error flag"]

Figure 1 : SBR error concealment overview



6 SBR stereo parameter to mono parameter downmix 

This module enables a decoder that is only capable of mono output to downmix stereo SBR parameters to mono parameters, so that mono decoding can be performed instead of stereo decoding.

Throughout this section, left and right refer to the two stereo channels that are mapped to one channel, from here on called merged.

6.1 Inverse filtering 

The inverse filtering is mapped from stereo to mono by taking the maximum value from left or right as shown below:

for (n = 0; n < N_Q; n++)
    bs_sbr_invf_mode_Merged(n) = MAX(bs_sbr_invf_mode_Left(n), bs_sbr_invf_mode_Right(n))

6.2 Additional harmonics 

The additional sinusoids in the merged mono bitstream are the union of the additional sinusoids present in the left or right channel as shown below.

for (n = 0; n < N_High; n++)
    bs_add_harmonic_Merged(n) = OR(bs_add_harmonic_Left(n), bs_add_harmonic_Right(n))
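
Both element-wise merges, for the inverse filtering modes and for the additional harmonics, can be sketched as below. The function names are hypothetical, and the per-band values are assumed to be plain integers (flags 0 or 1 for the harmonics).

```python
def merge_invf_mode(left, right):
    """Per noise floor band, keep the larger inverse filtering mode."""
    return [max(a, b) for a, b in zip(left, right)]


def merge_add_harmonic(left, right):
    """Per high resolution band, a sinusoid is present if either channel has one."""
    return [a | b for a, b in zip(left, right)]
```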

6.3 Envelope time borders 

The merging of the envelope time grid is explained using $t_E$, which contains all start/stop borders for all SBR envelopes in the current SBR frame; the creation of $t_E$ from the bitstream is explained in [1].

Merging the two $t_E$ vectors into one is done in the following sequence:

1. The start border of the merged $t_E$ is set to the largest of the start border values for the left and right channels.

2. The union of all left and right channel borders larger than the merged start border is added to the merged $t_E$.

3. All envelopes smaller than two time slots are removed from the merged $t_E$. This is achieved by starting from the beginning of $t_E$ and removing every stop border that is closer than 2 to its start border.

4. If there are more than 5 envelopes, the number of envelopes is reduced. This is achieved by starting from the end of $t_E$, searching towards the beginning of $t_E$ for an envelope smaller than 4 and removing the start border of that envelope; this continues until 5 envelopes are left.
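
The four merging steps above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the function name is hypothetical, the frame start border is never removed in step 4, and the loop simply stops if no envelope smaller than 4 is found, a case the text does not cover.

```python
def merge_envelope_borders(t_e_left, t_e_right, max_envelopes=5):
    """Merge two envelope border vectors following steps 1 to 4."""
    # Step 1: the merged start border is the larger of the two start borders.
    start = max(t_e_left[0], t_e_right[0])

    # Step 2: add the union of all borders that lie above the merged start.
    later = sorted({b for b in list(t_e_left) + list(t_e_right) if b > start})

    # Step 3: walk forward and drop every stop border that is closer than
    # 2 time slots to the start border of its envelope.
    merged = [start]
    for border in later:
        if border - merged[-1] >= 2:
            merged.append(border)

    # Step 4: while too many envelopes remain, search from the end for an
    # envelope smaller than 4 and remove its start border.
    while len(merged) - 1 > max_envelopes:
        for i in range(len(merged) - 2, 0, -1):  # assumption: keep merged[0]
            if merged[i + 1] - merged[i] < 4:
                del merged[i]
                break
        else:
            break  # assumption: nothing left to remove
    return merged
```

With left borders `[0, 4, 8, 16]` and right borders `[1, 5, 8, 12, 16]`, the merged grid starts at 1 and the border at 5 is dropped in step 3 because it is only one slot after 4.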

The index $l_A$ defined in Table 4.119 in [1] has to be merged from two values into one. The merging algorithm uses the following rule:

if (l_A_Left == -1)
    l_A_Merged = l_A_Right
else if (l_A_Right == -1)
    l_A_Merged = l_A_Left
else
    l_A_Merged = min(l_A_Left, l_A_Right)

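
The merging of the index l_A can be written as a small function. The name is hypothetical, and the value -1 is taken to mean that the index is not set in that channel, as implied by the comparison in the rule.

```python
def merge_l_a(l_a_left: int, l_a_right: int) -> int:
    """Merge the l_A index of the left and right channel into one value."""
    if l_a_left == -1:
        return l_a_right  # only the right channel may carry an index
    if l_a_right == -1:
        return l_a_left   # only the left channel carries an index
    return min(l_a_left, l_a_right)  # both set: keep the smaller index
```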



6.4 Noise time borders 



The number of noise floors $L_Q$ and the start/stop borders for the noise floors $t_Q$ are calculated according to [1] for the left and right channels and then merged as shown below.

L_Q_Merged = MAX(L_Q_Left, L_Q_Right)
if (L_Q_Merged > 1)
    t_Q_Merged(0) = t_E_Merged(0)
    t_Q_Merged(2) = t_E_Merged(L_E_Merged)
    if (L_Q_Left < L_Q_Right)
        t_Q_Merged(1) = t_Q_Right(1)
    else if (L_Q_Right < L_Q_Left)
        t_Q_Merged(1) = t_Q_Left(1)
    else
        t_Q_Merged(1) = [operator illegible](t_Q_Left(1), t_Q_Right(1))
else
    t_Q_Merged(0) = t_E_Merged(0)
    t_Q_Merged(1) = t_E_Merged(L_E_Merged)

After this is done, it is possible that the middle noise floor border does not coincide with any SBR envelope time border. If this is the case, the middle noise floor border is set equal to the closest envelope time border in the upwards direction.
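
The final adjustment of the middle noise floor border can be sketched as below. The function name is hypothetical, and only the snapping step is shown; the sketch assumes the merged noise floor borders have already been derived and that the merged envelope borders are sorted.

```python
def snap_middle_noise_border(t_q_merged, t_e_merged):
    """Move the middle noise floor border up to the next envelope border."""
    if len(t_q_merged) != 3:
        return list(t_q_merged)  # a single noise floor has no middle border
    start, middle, stop = t_q_merged
    if middle in t_e_merged:
        return [start, middle, stop]  # already on an envelope border
    # Closest envelope border above the middle border; keep the value
    # unchanged if none exists (assumption, not covered by the text).
    snapped = min((b for b in t_e_merged if b > middle), default=middle)
    return [start, snapped, stop]
```

If the closest border above turns out to be the stop border itself, the second noise floor would become empty; the text does not say how to handle that case, so a real decoder would need its own rule.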

6.5 Envelope data 

Before merging the actual data, the per-envelope selection of high or low frequency resolution is merged as shown below.

$$r_{Merged}(l) = \begin{cases} 0, & r_{Left}(h_{Left}(l)) = 0 \ \text{AND}\ r_{Right}(h_{Right}(l)) = 0 \\ 1, & \text{otherwise} \end{cases}, \quad 0 \le l < L_{E\_Merged}$$

where $h_{Right}(l)$ is defined by $t_{E\_Right}(h_{Right}(l)) \le t_{E\_Merged}(l) < t_{E\_Right}(h_{Right}(l)+1)$ and $h_{Left}(l)$ is defined by $t_{E\_Left}(h_{Left}(l)) \le t_{E\_Merged}(l) < t_{E\_Left}(h_{Left}(l)+1)$.

The envelope data, referred to as $E_{Orig}$ in [1], for the left and right channels are merged into $E_{Orig\_Merged}$ as shown below.

$$E_{Orig\_Merged}(k,l) = \frac{E_{LeftOrig}(g_{Left}(k), h_{Left}(l)) + E_{RightOrig}(g_{Right}(k), h_{Right}(l))}{2}, \quad 0 \le k < n(r_{Merged}(l)),\ 0 \le l < L_{E\_Merged}$$

where

$h_{Right}(l)$ is defined by $t_{E\_Right}(h_{Right}(l)) \le t_{E\_Merged}(l) < t_{E\_Right}(h_{Right}(l)+1)$ and

$g_{Right}(k)$ is defined by $F(g_{Right}(k), r_{Right}(h_{Right}(l))) \le F(k, r_{Merged}(l)) < F(g_{Right}(k)+1, r_{Right}(h_{Right}(l)))$ and

$h_{Left}(l)$ is defined by $t_{E\_Left}(h_{Left}(l)) \le t_{E\_Merged}(l) < t_{E\_Left}(h_{Left}(l)+1)$ and

$g_{Left}(k)$ is defined by $F(g_{Left}(k), r_{Left}(h_{Left}(l))) \le F(k, r_{Merged}(l)) < F(g_{Left}(k)+1, r_{Left}(h_{Left}(l)))$.
6.6 Noise floor data

The noise floor data is merged as the average of the left and right channel data according to the function below.

$$Q_{Orig\_Merged}(k,l) = \frac{Q_{LeftOrig}(k, h_{Left}(l)) + Q_{RightOrig}(k, h_{Right}(l))}{2}, \quad 0 \le k < N_Q,\ 0 \le l < L_{Q\_Merged}$$

where $h_{Right}(l)$ is defined by $t_{Q\_Right}(h_{Right}(l)) \le t_{Q\_Merged}(l) < t_{Q\_Right}(h_{Right}(l)+1)$ and $h_{Left}(l)$ is defined by $t_{Q\_Left}(h_{Left}(l)) \le t_{Q\_Merged}(l) < t_{Q\_Left}(h_{Left}(l)+1)$.
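
The averaging of the noise floor data, including the lookup of the covering time segment h(l) in each channel, can be sketched as follows. All names are hypothetical, and the data layout `q[l][k]` (noise floor l, band k) is an assumption made for the example.

```python
def covering_segment(borders, t):
    """Return h such that borders[h] <= t < borders[h + 1]."""
    for h in range(len(borders) - 1):
        if borders[h] <= t < borders[h + 1]:
            return h
    raise ValueError("time border is outside the channel's time grid")


def merge_noise_floor(q_left, t_q_left, q_right, t_q_right, t_q_merged):
    """Average left and right noise floor data on the merged time grid."""
    merged = []
    for l in range(len(t_q_merged) - 1):
        h_left = covering_segment(t_q_left, t_q_merged[l])
        h_right = covering_segment(t_q_right, t_q_merged[l])
        merged.append([(a + b) / 2
                       for a, b in zip(q_left[h_left], q_right[h_right])])
    return merged
```

The same segment lookup applies to the envelope data, which additionally needs the frequency band mapping g(k) between the channel resolution and the merged resolution.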



7 Output resampler tool



The audio output from an Enhanced aacPlus decoder using a downsampled synthesis filterbank (see 4.6.18.4.3 or 4.6.18.8.2.3 of [1]) is given at the AAC decoder sampling rate $Fs_{out}$. If an output with a lower sampling rate is desired, a downsampler tool is required in order to convert the audio output from the source sampling frequency $Fs_{out}$ to the target sampling frequency $Ft < Fs_{out}$.

The downsampler tool consists of three parts connected to the final operations of the SBR decoder as in Figure 1 (referring to Figure 4.44 of [1]). The QMF bandlimiter operates on QMF samples and is inserted prior to the QMF synthesis bank; the spline resampler and the postfilter operate on synthesised PCM audio samples.



[Figure: block diagram with the blocks Analysis QMF Bank, Envelope Adjuster, QMF Bandlimiter, Synthesis QMF Bank, Spline Resampler, Postfilter and Output]

Figure 1 : Downsampler tool



7.1 QMF bandlimiter

The QMF bandlimiter maps the final QMF matrix $Y$ to $Y_B$, according to the rule

$$Y_B(k,l) = \begin{cases} Y(k,l), & 0 \le k < k_B \\ 0, & k_B \le k < 32 \end{cases}, \quad 0 \le l < \mathrm{numTimeSlots} \cdot 2,$$

where the cutoff frequency subband index $k_B$ is given by

$$k_B = \mathrm{INT}\!\left(64 \cdot \frac{Ft}{Fs_{SBR}}\right).$$
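
The bandlimiter can be sketched as follows. The function names are hypothetical, INT is assumed to truncate towards zero, sampling rates are assumed to be integers in Hz, and the QMF matrix is represented as a list of 32 rows (one per subband) of complex samples.

```python
def cutoff_subband(ft_hz: int, fs_sbr_hz: int) -> int:
    """k_B = INT(64 * Ft / Fs_SBR), using integer arithmetic."""
    return (64 * ft_hz) // fs_sbr_hz


def bandlimit(y, k_b):
    """Keep subbands below k_B and zero all subbands from k_B upwards."""
    return [list(row) if k < k_b else [0j] * len(row)
            for k, row in enumerate(y)]
```

As a plausibility check on the formula: each of the 32 subbands of the downsampled synthesis bank is Fs_SBR/128 wide, so the target Nyquist frequency Ft/2 falls at subband 64·Ft/Fs_SBR. For Ft = 16 kHz and Fs_SBR = 48 kHz this gives k_B = 21.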



7.2 Spline resampler 



Given the discrete time output $x(l)$ from the synthesis QMF bank, the spline resampler relies on the continuous time signal representation

$$s(t) = \sum_{l \ge 0} x(l)\,\varphi(Fs_{out} \cdot t - l),$$

where $\varphi(t)$ is a cubic B-spline supported on the interval [0,4]. The discrete time output of the resampler is then defined, up to a suitable delay, by $y(n) = s(n / Ft)$. For each $n \ge 0$, define the integer $m(n)$ and the remainder $0 \le p(n) < 1$ by

$$\frac{Fs_{out}}{Ft} \cdot n = m(n) + p(n).$$

Then the spline evaluation results in the time varying causal filtering,

$$y(n) = \sum_{d=0}^{3} B(p(n), d)\, x(m(n) - d), \quad n = 0, 1, \ldots,$$

with homogeneous initialization $x(l) = 0$, $l < 0$, and filter coefficients

$$B(p,d) = \begin{cases} \frac{1}{6} p^3, & d = 0; \\ \frac{1}{2}\left(-p^3 + p^2 + p\right) + \frac{1}{6}, & d = 1; \\ \frac{1}{2} p^3 - p^2 + \frac{2}{3}, & d = 2; \\ \frac{1}{6} (1 - p)^3, & d = 3. \end{cases}$$
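
A minimal sketch of the time varying filter is given below. The function names are hypothetical, the sampling rates are assumed to be integers in Hz so that m(n) and p(n) can be computed exactly, and output is produced only while m(n) stays inside the input block.

```python
def bspline_coeffs(p):
    """Filter coefficients B(p, d) for d = 0..3 and 0 <= p < 1."""
    return (
        p ** 3 / 6.0,
        0.5 * (-p ** 3 + p ** 2 + p) + 1.0 / 6.0,
        0.5 * p ** 3 - p ** 2 + 2.0 / 3.0,
        (1.0 - p) ** 3 / 6.0,
    )


def spline_resample(x, fs_out_hz, ft_hz):
    """Resample x from fs_out_hz to ft_hz with the cubic B-spline filter."""
    y = []
    n = 0
    while True:
        # (Fs_out / Ft) * n = m(n) + p(n), split exactly with integers.
        m, rem = divmod(n * fs_out_hz, ft_hz)
        if m >= len(x):
            break
        b = bspline_coeffs(rem / ft_hz)
        # Samples with a negative index are zero (homogeneous initialization).
        y.append(sum(b[d] * x[m - d] for d in range(4) if m - d >= 0))
        n += 1
    return y
```

The four coefficients sum to one for every p, so a constant input is reproduced unchanged once the filter memory is filled.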



7.3 Postfilter

In order to compensate for a small loss in signal power at high frequencies, a first order IIR postfilter is applied to the output $y(n)$ of the spline resampler, resulting in the final output $z(n)$, where $z(-1) = 0$ and

$$z(n) = y(n) + \alpha\,(y(n) - z(n-1)), \quad n = 0, 1, \ldots$$

The value of the parameter $\alpha$ is given by Table 1.
Table 1 : Postfilter coefficient $\alpha$

| $Fs_{out}$ | $Ft$ = 8 kHz | $Ft$ = 16 kHz |
|---|---|---|
| 22.05 kHz | 0.06 | 0.30 |
| 24 kHz | 0.05 | 0.24 |
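
The postfilter recursion and the coefficient table can be sketched as below. The names are hypothetical, and keying the table by the pair (Fs_out, Ft) in Hz is a choice made for this example only.

```python
# Postfilter coefficient alpha, keyed by (Fs_out in Hz, Ft in Hz).
POSTFILTER_ALPHA = {
    (22050, 8000): 0.06,
    (22050, 16000): 0.30,
    (24000, 8000): 0.05,
    (24000, 16000): 0.24,
}


def postfilter(y, alpha):
    """Apply z(n) = y(n) + alpha * (y(n) - z(n-1)) with z(-1) = 0."""
    z_prev = 0.0
    z = []
    for sample in y:
        z_prev = sample + alpha * (sample - z_prev)
        z.append(z_prev)
    return z
```

For a constant input the recursion settles at z = y, so the filter has unity gain at DC and only lifts the higher frequencies.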
Change history

0001 Correction of written specification: AAC concealment fade out factor